33 research outputs found

    Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures

    Get PDF
    Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient sources of light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile of battery-free operation, it is proposed to innovate Elastic Intermittent Computation (EIC) as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries, that leverage intrinsic-non-volatility to realize middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads vital to IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior with a compact model to leverage the non-volatility of the device for intrinsic provision of intermittent computation and lifetime energy reduction. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic which is adaptable at the gate-level to support variable duty cycle execution that is robust to brief and extended supply outages or unscheduled dropouts, and development of spin-based research synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit utilizing pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations

    Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks

    Get PDF
    Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, whereas these images are very complex and their analyses by pathologists require large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGBcolored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., \u3c8:8\u3e. For energy estimation on the used CNN model, we found that the energy used in RGB color mode with 32-bit is considerably higher than the other lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in the Internet of Things (IoT) systems that facilitate healthcare systems

    Ultra-Fast, High-Performance 8x8 Approximate Multipliers by a New Multicolumn 3,3:2 Inexact Compressor and its Derivatives

    Full text link
    Multiplier, as a key role in many different applications, is a time-consuming, energy-intensive computation block. Approximate computing is a practical design paradigm that attempts to improve hardware efficacy while keeping computation quality satisfactory. A novel multicolumn 3,3:2 inexact compressor is presented in this paper. It takes three partial products from two adjacent columns each for rapid partial product reduction. The proposed inexact compressor and its derivates enable us to design a high-speed approximate multiplier. Then, another ultra-fast, high-efficient approximate multiplier is achieved utilizing a systematic truncation strategy. The proposed multipliers accumulate partial products in only two stages, one fewer stage than other approximate multipliers in the literature. Implementation results by Synopsys Design Compiler and 45 nm technology node demonstrates nearly 11.11% higher speed for the second proposed design over the fastest existing approximate multiplier. Furthermore, the new approximate multipliers are applied to the image processing application of image sharpening, and their performance in this application is highly satisfactory. It is shown in this paper that the error pattern of an approximate multiplier, in addition to the mean error distance and error rate, has a direct effect on the outcomes of the image processing application.Comment: 21 Pages, 18 Figures, 6 Table

    Nv-Clustering: Normally-Off Computing Using Non-Volatile Datapaths

    No full text
    With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a non-Volatile (NV) flip-flop design approach, referred to as NV Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are utilized to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality, based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks including ISCAS-89, MCNS, and ITC-99 computational circuits using a LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and power dissipation, area, and delay benefits. For instance, results for ISCAS-89 benchmarks indicate 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay as compared to alternative NV-FF based designs, as evaluated via SPICE simulation at the 45-nm technology node

    Energy-Efficient And Process-Variation-Resilient Write Circuit Schemes For Spin Hall Effect Mram Device

    No full text
    In this paper, various energy-efficient write schemes are proposed for switching operation of spin hall effect (SHE)-based magnetic tunnel junctions (MTJs). A transmission gate (TG)-based write scheme is proposed, which provides a symmetric and energy-efficient switching behavior. We have modeled an SHE-MTJ using precise physics equations, and then leveraged the model in SPICE circuit simulator to verify the functionality of our designs. Simulation results show the TG-based write scheme advantages in terms of device count and switching energy. In particular, it can operate at 12% higher clock frequency while realizing at least 13% reduction in energy consumption compared to the most energy-efficient write circuits. We have analyzed the performance of the implemented write circuits in presence of process variation (PV) in the transistors\u27 threshold voltage and SHE-MTJ dimensions. Results show that the proposed TG-based design is the second most PV-resilient write circuit scheme for SHE-MTJs among the implemented designs. Finally, we have proposed the 1TG-1T-1R SHE-based magnetic random access memory (MRAM) bit cell based on the TG-based write circuit. Comparisons with several of the most energy-efficient and variation-resilient SHE-MRAM cells indicate that 1TG-1T-1R delivers reduced energy consumption with 43.9% and 10.7% energy-delay product improvement, while incurring low area overhead

    Logic-Encrypted Synthesis For Energy-Harvesting-Powered Spintronic-Embedded Datapath Design

    No full text
    The objectives of advancing secure, intermittency-tolerant, and energy-aware logic datapaths are addressed herein by developing a spin-based design methodology and its corresponding synthesis steps. The approach selectively-inserts Non-Volatile (NV) Polymorphic Gates (PGs) to realize datapaths which are suitable for intrinsic operation in Energy-Harvesting-Powered (EHP) devices. Spin Hall Effect (SHE)-based Magnetic Tunnel (MTJs) are utilized to design NV-PGs, which are combined within a Flip-Flop (FF) circuit to develop a PG-FF realizing Boolean logic functions with inherent state-holding capability. The reconfigurability of PGs is leveraged for logic-encryption to enhance the security of the developed intermittency-resilient circuits, which are applied to ISCAS-89, MCNS, and ITC-99 benchmarks. The results obtained indicate that the PG-FF based design can achieve up to 7.1% and 13.6% improvements in terms of area and Power Delay Product (PDP), respectively, compared to NV-FF based methodologies that replace the CMOS-based FFs with NV-FFs. Further PDP improvements are achieved by using low-energy barrier SHE-MTJ devices within the PG-FF circuit. SHE-MTJs with 30kT energy exhibit 40.5% reduction in PDP at the cost of lower retention times in the range of minutes, which is still sufficient to achieve forward progress in EHP devices having more than hundreds of power-on and power-off cycles per minute

    Design And Evaluation Of An Ultra-Area-Efficient Fault-Tolerant Qca Full Adder

    No full text
    Quantum-dot cellular automata (QCA) has been studied extensively as a promising switching technology at nanoscale level. Despite several potential advantages of QCA-based designs over conventional CMOS logic, some deposition defects are probable to occur in QCA-based systems which have necessitated fault-tolerant structures. Whereas binary adders are among the most frequently-used components in digital systems, this work targets designing a highly-optimized robust full adder in a QCA framework. Results demonstrate the superiority of the proposed full adder in terms of latency, complexity and area with respect to previous full adder designs. Further, the functionality and the defect tolerance of the proposed full adder in the presence of QCA deposition faults are studied. The functionality and correctness of our design is confirmed using high-level synthesis, which is followed by delineating its normal and faulty behavior using a Probabilistic Transfer Matrix (PTM) method. The related waveforms which verify the robustness of the proposed designs are discussed via generation using the QCADesigner simulation tool
    corecore